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We designed a model-based analysis to predict the occurrence of population patterns in distributed spiking 
activity. Using a maximum entropy principle with a Markovian assumption, we obtain a model that accounts 
for both spatial and temporal pairwise correlations among neurons. This model is tested on data generated with 
a Glauber spin-glass system and is shown to correctly predict the occurrence probabilities of spatio-temporal 
patterns significantly better than Ising models taking into account only pairwise correlations. This increase of 
predictability was also observed on experimental data recorded in parietal cortex during slow-wave sleep. This 
approach can also be used to generate surrogates that reproduce the spatial and temporal correlations of a given 
data set. 
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The structure of the cortical activity, and its rele- 
vance to sensory processing or motor planning, are a 
long standing debate lU]. There is a need to describe the 
structure of the spiking activity based on well-defined 
statistical models. To infer the state of the neural net- 
work, a first line of work has tried to model the neural 
activity with Hidden Markov Models 12, S, 0]. Maxi- 
mum entropy models have proved useful for the analy- 
sis of many complex systems (see for example jsll^]))- 
Another line of research has used this approach to de- 
scribe neural activity, focusing on spiking patterns lying 
within one time bin ^ §]. However, the latter is not 
prone to predict the temporal statistics of the neural ac- 
tivity |@]. In the following study, we design a model 
inspired from both lines of research to better describe 
the neural dynamics. This model is a maximum entropy 
model based on the correlation values, and respecting a 
Markovian assumption. Thus it takes into account both 
spatial and temporal correlations. We show its ability 
to describe the spatio-temporal statistics of the activity 
on simple network models and recordings in the mam- 
malian parietal cortex in vivo. 

We consider neurons whose spikes are recorded 
and binned, for a long time period, noted as 
{a{t)} := {a/(f)},=i,...,iv where o,- e {-1;1}. The 
purpose of a statistical model is to describe as 
closely as possible the probability distribution of the 
spatio-temporal patterns, P{{a{t)},{a{t + 1)},...) 
with a limited number of parameters. For that 
purpose, we make a Markovian hypothesis on this 



distribution, and aim at finding the joint distribu- 
tion P({or+';{o'}^) = P({or+i|{o'}^)P({o'}^) 
which maximizes the entropy //({a}^+'; {o'}^) = 
-I{a}.{a'}^'({ar';Kr)ln(P({ar+';{a'r)) 
with the constraints on the first- and second-order 
statistical moments of the activity m, =< a, >, 
Qj =< a,(f)o/(f) > and Cfj =< Oi{t)Oj{t + 1) >, the 
normalisation constraint, and the marginal distribution 
constraint: i:{„,}P({a}^+i; {o'}^) = P({a}^+i). 

By using Lagrange multipliers, and then applying the 
marginal distribution constraint, we find: 

Z({a}) being the conditional partition function , and 
{hi,Jij}^j=i are the Lagrange multipliers corresponding 
to the constraints given by {m;,C,/}J^y^j. 

We assume that the detailed balance is satisfied for a 
stationary distribution Pstat{{o}). Therefore the Marko- 
vian matrix is also time-invariant and satisfies the fol- 
lowing relation 

P({a'}t{a})/'„„,({o}) =P({o}|{o'})P„„({a'}) (2) 

so that: 

P{{a}-{a'})=P{{a'}\{a})PM{<^}) (3) 

exp h^Oi + Lf^.^i yoa-a; + lf,=i Jlj^'i^j) ^ ^ ,^ ^ 

— i i 1 Lp t{,^'\\ 
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We then develop the extensive quantity ln(Z({o'})) 
up to the second order: 

N N 

ln(Z({a'})) = ln(Z,//)-X/'X- '£j';ja'ia'j + 0i8a(i) 
/=i /j=i 

The k-th order terms are k products of J^j. This approx- 
imation is thus vaHd in the weak temporal correlation 
limit. 

Note that the coefficients of this development, 
{h^,J[j}fj^^, can be obtained analytically from (O by 
a straightforward computation. The final form for the 
transition function then becomes: 
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N N 



P{{a}\{a'}) = — exp ^ + L ■'u^i^j+ 
^f^U \/=l /j=l 

N N N ' 



(5) 



Using the detailed balance, the stationary distribution 
is then also restricted to the second order and has the 
generic form: 



Since 



^(llii^""oi+l!tj=iJij'"''-''J 



exp It 1 hf'a'/ + L^^i J^-"ayj 



Pstat (M)= IP({cT}|{o'})P„„,({o'}) (7) 
{a'} 

the parameters {hf"' ,Jf"'}^ are fully determined by 
the nii and Ctj values. 

Numerically, we adopt a slightly different approach, 
which is shown to be equivalent to the approximation 
made above. We maximize separately the entropy of the 
stationary distribution Psiut{{'^}) and the time-invariant 
joint distribution P({o}; {oM-), without the marginaliza- 
tion condition. We obtain (|6]l for Pstat{{<^}), and: 
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(8) 



The transition matrix is then determined by: 

P{{^}\{^'}) = T^^tRF' ^'^^'^^ ^^^^^ ® 
identify h\ = h\ - If"' and J^j = - 7/; 



This model contains seven sets of parameters, 
{hi,lvf"',W:jijJff,Jl^,j}^sl.^,. In order to be equiv- 
alent to the previous model we must apply several con- 
straints which will reduce the number of free param- 
eters. The stationary parameters {hf"' ,Jjj"'}fj=i are 
bound to the others by using the relation (Q as be- 
fore. Then we have to apply a normalization on the 
conditional probability distribution (|5]l to recover the 
marginalization condition, which is a special form of 
(HI with Zgff = Therefore, the parameter set 

{K^JlMj=i is also defined by {h,,Jij,j}j}f.^^ which 
are the only free parameters. This model is thus equiva- 
lent to the previous approximation and allows for more 
tractable numerical treatements. 

To test the modeL we first used a raster generated by 
a Glauber model |10|, whose flip transition probability 
from one time step to the next is 
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1 -0,(f)tanh (^{JfjOj{t)+hjaj 



(t)) 



where Tq is the effective time constant and Jij, hi are 



coupling constants of the neurons a nlW . 

Fitting the model parameters to the corresponding 
nii, Cij and C/; values is a classical Boltzmann machine 



jstat 
'ij ■'ij ■ 



learning problem Ill2ll . We started with an analytical ap- 
proximation of the solution fisl] followed by a gradient 
descent: at each time step, the m\, Cjj and Cjj predicted 
by the model were estimated through a Monte-Carlo al- 
gorithm, compared to the experimental ones, and the 
model parameters were updated according to the dif- 
ference. The algorithm was stopped when the differ- 
ence between the theoretical and experimental values 
was less than 0.005, of the order of the uncertainty on 
the nii and Cij estimations. 

In the following, we compared this model to sim- 
pler versions already used in the literature. The "Ising 
model" has the same description of Pstat{{'^Yl^ but as- 
sumed P({o}, {a'}) = Pv,„,({a})Pv,„,({a'})i3 i (this 
is equivalent to assume Cjj = niiinj). The "independent 
model" assumed no second order interactions: all the 
previous parameters are null but the li f"'. 

To estimate their performance in describing the statis- 
tics of the neural activity, we estimated the occurrence 
probability of several spiking patterns empirically and 
compared it to the ones predicted by each model. Figure 
[U shows the prediction of the three models for the prob- 
ability of patterns with respectively 1, 2 and 3 time bins. 
For 1-bin patterns, the Markov and the Ising model are 
equivalent, and showed a good prediction performance. 
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FIG. 1: (Color online) Performance of the 3 statistical mod- 
els to describe the statistics generated by the Glauber model 
(Xo = 2). For each panel, we compared the probability of sev- 
eral patterns estimated empirically from the raster, and pre- 
dicted by the corresponding model. Each point corresponds to 
a different pattern, picked up in the raster. The point color in- 
dicates the number of spikes in each pattern. The black line in- 
dicates equality, and the dashed curves the 95% confidence in- 
terval for the estimated probability. Each column corresponds 
to one of the three models described earlier. From left to right: 
the Markov, Ising and Independent models (see text). The dif- 
ferent lines correspond to different pattern sizes (from top to 
bottom: 1, 2 and 3 temporal bins in the pattern). 



with most of the points prediction being in the confi- 
dence interval of the estimated probabiUty. For patterns 
with 2 and 3 time bins, the prediction remained satisfy- 
ing for the Markov model, while it is strongly degraded 
for the Ising model. Note that the Ising and indepen- 
dent models give similar performances here, contrary 
to 10, m. Indeed, for a broad range of parameters in 
the Glauber model, the absolute correlation values are 
weak. However, their temporal extent controlled by To 
(see Fig. |2f)), is already sufficient to impair the Ising 
model performance. 

We quantified the fit between the model prediction 
and the experimentally measured statistics by com- 
puting the Jensen-Shannon Divergence: Djs{P,Q) = 
H{0.5{P + Q))-0.5{H{P) +H{Q)) (where H(-) is the 
Shannon entropy) measures the similarity between two 
distributions P and Q ll4ll . Figure|2]\ shows the value of 
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FIG. 2: (Color online) Quantification of the models perfor- 
mance. A: Jensen-Shannon Divergence Djs between the pre- 
diction of the three statistical models, and the probabilities es- 
timated empirically, for different pattern sizes. The raster has 
been generated by the Glauber numerical model with param- 
eter To=1.5. The gray line indicates the value below which 
Djs is not significantly different from zero (p < O.OI, tH). 
B: Quantification with the information ratio h/ln- C: Com- 
parison for 2-bin pattern sizes, for different values of the 
parameter in the Glauber model. D: Auto-correlation of the 
population averaged activity for different Xq- 



Djs for the three models, for different numbers of bins 
in the pattern. This confirmed our previous observation. 
For one bin, the Ising and the Markov model are equiv- 
alent, and performed better than the independent model. 
For two bins or more, the Markov model showed lower 
Djs values than the Ising model and the independent 
model. This prediction performance does not vary sig- 
nificantly with the number of bins. The Markov model 
is thus able to predict the probability of a pattern even 
when it is composed of several bins. It thus describes 
with more accuracy the statistics of the neural activity 
over a large temporal extent. 

The better performance of the Markov model com- 
pared to the Ising model has to be related with the 
shape of the correlation functions: if the temporal cor- 
relation functions can be reduced to a Dirac-like form, 
there should be no difference between the Markov and 
Ising models (case Tq = 1 in Fig. |2]r-D). Above 1, 
the normaUzed difference 8log(Djs) = {\og{Df^"''"" ) - 



log(Dj'^"^'))/log(Dy™'''') quickly increases to reach a 
peak performance of 120% around 2.5, and then slowly 
decreases to a plateau of 46 % improvement from the 
Ising to the Markov model, for To > 10. The Markov 
model thus performs better over a large range of Tq val- 
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ues. From the experimental perspective, the Markov 
model prediction is at best when the ratio between the 
correlation time constant (Fig. |2j3) and the bin size is 
around 2.5, but remains satisfying for larger ratios. 

We also computed the fraction of the ensemble cor- 
relations that was captured by the Markov model, f- = 

where Sk is the entropy when taking into account 
the correlations up to the k-th order 10, [Tal . This mea- 
sures the improvement of the fit from the independent 
model to the Markov model. The value is maximal for 
two time bins, and then decreased (Fig. |1]3), in line 
with the observed difference in Djs between the inde- 
pendent and the Markov model. This Markov model 
is thus able to explain a major part of the higher order 
spatio-temporal statistics. 

Apart from describing the statistics of the activity, 
this model can also be used to generate surrogate rasters 
having the same statistics than the captured ones. For 
that purpose, starting from an initial random pattern, we 
generate at each time step a new pattern according to 
(|5]l. We then compared the statistics of this new raster 
with the original prediction (Fig.[3K). Although the gen- 
erator only used the hi, Jij and J^j coefficients of the 
model, the generated stationary probability is in very 
good agreement with the predicted stationary distribu- 
tion estimated from the original data set, described by 
the hf"' and Jff" ■ This result shows the consistency of 
the model: the transition matrix defined by the hi, Jij 
and jj^j parameters has indeed the stationary distribution 
defined by the hf" and Jfj'" coefficients in Q. 

We then applied the same analysis to the surrogate 
data, to obtain a model of the surrogate statistics. Fig 
[3ji shows that we recover the same predictions than with 
the original analysis. The generator is thus producing a 
surrogate raster congruent with the statistical model. 

We then tested the model on in vivo biological data 
taken from IitIi . composed of 8 simultaneous multi-unit 
recordings in the cat parietal cortex in different sleep 
states (Slow Wave Sleep (SWS) and Rapid Eye Move- 
ment (REM)). For the activity recorded during SWS, 
the performance of the Markov model is significantly 
higher than for the Ising model. For a bin size of 10 
ms, this was the case for different template sizes above 
2 (Fig. |4]\). The improvement was comparable to the 
difference between independent and Ising models. We 
estimated 5log(£)ys), the normalized log-difference be- 
tween the Markov and Ising associated Djs, for differ- 
ent combinations of template and bin sizes. The result 
holds, with Djs in the same order of magnitude, for 
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FIG. 3: (Color online) Tests of the surrogate raster generator. 
A: Comparison between the pattern probabilities in the surro- 
gate raster, and the ones predicted by the model in the original 
analysis, for 1-bin patterns and a Glauber model with Tq = 1 
(Djs — 0.0003). Same representation than in Fig.[T] B: Same 
comparison than A for a Glauber model with = 1.5 (Djs — 
0.0005). C: Comparison between the prediction of the model 
fitted on the original data (Tq = 2), and the prediction fitted 
on the surrogate raster, for 2-bins pattern (Djs — 0.0024). D: 
Same comparison than C for 3-bins patterns {Djs — 0.0024). 



larger bin sizes as long as the pattern length, defined as 
(template size) x (bin size), is below ~120 ms (Fig.|4jZ!). 
To see how the sleep state affects this result, we com- 
pared the 5log{Djs) between the SWS and the REM 
activities (Fig.|4jZ!). For pattern length below ~120 ms, 
while the Markov model outperforms the Ising model 
in describing the SWS activity, the improvement drops 
rapidly for the REM state. For very large pattern lengths 
300 ms), the Markov and Ising models perform 
equally well {5log{Djs) ~ 0) for both states. This faster 
drop of performance is related to the smaller correlation 
time constant in the REM state (Fig.|4l3). This is indeed 
reminiscent of the case To = 1 in the Glauber model (see 
Fig.|2jr), and as a consequence, we observed no signif- 
icant difference between the Ising and Markov models 
for intermediate pattern lengths. On the contrary, the 
SWS state exhibits larger correlation extent (similar to 
To > 1 in Fig. 131!), and shows a persistent difference 
5log(Dys). To futher emphasize this relation, we mea- 
sured the correlation time constant To for both states. We 
then computed 5log(Dys) for different pattern lengths. 
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FIG. 4: (Color online) Test of the models on experimen- 
tal data. A: Jensen-Shannon divergence Djs for the 3 mod- 
els, estimated for the activity of 8 channels in cat parietal cor- 
tex, and for different template sizes. Bin width of 10 ms. B: 
Auto-correlation of the population averaged activity for the 
SWS and REM sleep states. The correlation time constants 
To were estimated by fitting an exponential function. C: Rel- 
ative log-difference 8\og{Djs) between the Markov and Ising 
Djs, compared for the SWS and the REM data. The dotted 
line indicates equality. The different points correspond to dif- 
ferent combinations of template and bin sizes, colour coded 
by the pattern length (template size x bin size). Points with 
black edge correspond to panel B values. D: 8log{Djs) for 
both states and for different pattern lengths, in unit of their 
respective correlation time constant (pattern length)/Xo. 



expressed in unit numbers of their respective correla- 
tion time constant (pattern length)/To- When rescaled, 
both states exhibit the same dependency with the pat- 
tern length (Fig.|4j)). The Markov model is thus suited 
for the analysis of temporally correlated activity for dif- 
ferent data sets and for pattern lengths up to 10 times 
their correlation time constant. 

In conclusion, we have presented a probabilistic 
model which gives an account of the distributed spik- 
ing activity with relatively few parameters, and takes 
into account both spatial and temporal pairwise corre- 
lations. The model still predicts the occurrence prob- 
abiUty of larger temporal patterns, and can be used to 
generate surrogates which mimic the temporal and spa- 
tial correlation structure of the data. It would be inter- 
esting to test it on the specific data that have been used 
to show the failure of the ising model Beyond spik- 



ing assembly activity, other event-based data with long 
enough recordings might be interesting to analyze with 
this model (for example calcium transients lIlSll ). This 
method of analysis will help to tackle fundamental is- 
sues about the structure of the neural activity, like the 
existence of higher order statistics, or the Markovian 
nature of the temporal correlations. It could also im- 
pact on a broad range of areas of physics and biology 
which used maximum entropy models lfl9ll . 
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